Self-training (machine learning)

2024-10-21

A variant of self-supervised learning that is particularly useful when all of the following conditions are true:¹

The ratio of unlabeled examples to labeled examples in the dataset is high.
The task is a classification problem.

Self-training works by iterating over the following two steps until the model stops improving:

1. Use supervised learning to train a model on the labeled examples.
2. Use the model created in Step 1 to generate predictions (labels) on the unlabeled examples, then move the examples predicted with high confidence into the labeled set, each with its predicted label.

Notice that each iteration of Step 2 adds more labeled examples for Step 1 to train on.
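
As a rough illustration, the two-step loop might be sketched in Python as follows. Everything specific here is an assumption made for the example, not part of the glossary definition: the nearest-centroid classifier, the softmax-over-distances confidence score, the hypothetical names fit_centroids, predict_with_confidence, and self_train, the 0.9 threshold, and the stopping rule (stop when a round promotes no new examples, a simple stand-in for "until the model stops improving").

```python
import math


def fit_centroids(points, labels):
    """Step 1 (illustrative): 'train' by computing one mean point per class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}


def predict_with_confidence(centroids, point):
    """Return (label, confidence) for one 2-D point.

    Confidence is a softmax over negative distances to each centroid.
    This scoring rule is an assumption chosen for the sketch.
    """
    scores = {c: math.exp(-math.dist(point, centre)) for c, centre in centroids.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=10):
    """Alternate Step 1 and Step 2 until no unlabeled example is promoted.

    labeled:   list of ((x, y), label) pairs
    unlabeled: list of (x, y) points
    Returns the final model and the points that were never promoted.
    """
    points = [p for p, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)

    for _ in range(max_rounds):
        # Step 1: supervised training on the current labeled set.
        model = fit_centroids(points, labels)

        # Step 2: pseudo-label the pool and promote confident predictions.
        remaining = []
        for p in pool:
            label, confidence = predict_with_confidence(model, p)
            if confidence >= threshold:
                points.append(p)
                labels.append(label)
            else:
                remaining.append(p)

        if len(remaining) == len(pool):
            break  # nothing was promoted, so another round would change nothing
        pool = remaining

    # Final fit so the returned model reflects every promoted example.
    return fit_centroids(points, labels), pool
```

A fuller version would more likely track accuracy on a held-out validation set and stop when that metric plateaus, which is closer to the "stops improving" criterion described above. The threshold matters as well: set too low, it lets wrong pseudo-labels into the labeled set, and later rounds then train on those mistakes.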

Footnotes

¹ developers.google.com/machine-learning/glossary#self-training